Use multi-stage build for processing dependencies#420
Conversation
|
I'd like to avoid patching in dependencies whenever possible. From what I understand here, tesseract running locally relies on certain dependencies only available in Ubuntu 20.04, the last time the local code was compiled. I think the better move here- and which actually matches what production does, is to make tesseract self contained when running locally. I have a branch that does this here: |
|
I think these two approaches are actually similar - one is doing a multistage docker build in order to grab the necessary libraries from the Ubuntu 20.04 image, while the other is just checking those libraries directly into git. |
This fixes a problem I was having locally where OCR and redaction failed because of missing binaries. More here: https://gist.github.com/eyeseast/434734582ae07ee5845abe968f7fc108
We now use a multi-stage build to compile dependencies in Ubuntu and then copy them into our
python:3.12-slimimage, so the processing scripts can find them.This only affects local development, not production.